Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions


Mixture modelling involves explaining some observed evidence using a combination of probability distributions. The crux of the problem is the inference of an optimal number of mixture components and their corresponding parameters. This paper discusses unsupervised learning of mixture models using the Bayesian Minimum Message Length (MML) criterion. To demonstrate the effectiveness of search and inference of mixture parameters using the proposed approach, we select two key probability distributions, each handling fundamentally different types of data: the multivariate Gaussian distribution to address mixture modelling of data distributed in Euclidean space, and the multivariate von Mises-Fisher (vMF) distribution to address mixture modelling of directional data distributed on a unit hypersphere. The key contributions of this paper, in addition to the general search and inference methodology, include the derivation of MML expressions for encoding the data using multivariate Gaussian and von Mises-Fisher distributions, and the analytical derivation of the MML estimates of the parameters of the two distributions. Our approach is tested on simulated and real-world data sets. For instance, we infer vMF mixtures that concisely explain experimentally determined three-dimensional protein conformations, providing an effective null model description of protein structures that is central to many inference problems in structural bioinformatics. The experimental results demonstrate that the performance of our proposed search and inference method, along with the encoding schemes, improves on state-of-the-art mixture modelling techniques.
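To make the directional case concrete, the following is a minimal sketch of the vMF log-density on the unit sphere in three dimensions, together with the log-likelihood of a vMF mixture. It is an illustration only, not the paper's MML encoding or estimator: the function names `vmf3_logpdf` and `vmf3_mixture_loglik` are hypothetical, the dimension is fixed at 3 so that the normalising constant has the closed form kappa / (4 * pi * sinh(kappa)), and the score computed is a plain likelihood rather than a message length.

```python
import numpy as np


def vmf3_logpdf(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    x may be a single unit vector (shape (3,)) or a batch (shape (n, 3));
    mu is the unit mean direction and kappa > 0 the concentration.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # log(kappa / (4*pi*sinh(kappa))), rewritten so that large kappa does not
    # overflow: sinh(k) = exp(k) * (1 - exp(-2k)) / 2.
    log_norm = (
        np.log(kappa)
        - np.log(2.0 * np.pi)
        - kappa
        - np.log(-np.expm1(-2.0 * kappa))
    )
    return log_norm + kappa * (x @ mu)


def vmf3_mixture_loglik(X, weights, mus, kappas):
    """Total log-likelihood of unit vectors X (n x 3) under a vMF mixture."""
    X = np.asarray(X, dtype=float)
    # One column per component: log(weight) + component log-density.
    comp = np.stack(
        [np.log(w) + vmf3_logpdf(X, m, k) for w, m, k in zip(weights, mus, kappas)],
        axis=1,
    )
    # Log-sum-exp across components, then sum over data points.
    return float(np.logaddexp.reduce(comp, axis=1).sum())
```

A likelihood of this kind always improves as components are added, which is why a criterion such as MML, which also charges for stating the number of components and their parameters, is needed to choose the mixture size.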


